The first thing we do when starting any piece of work is import the libraries. You don't need to import every single library you'll use in the notebook right at the beginning; you can start with the bare minimum:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LogisticRegression
df = pd.read_csv("C:/Users/91852/Downloads/archive (1)/HR_comma_sep.csv")  # adjust this path to your local copy of the dataset
Just the above are enough when you start working on a problem. From then on you can add further imports at the top, or import them wherever you happen to be in the notebook. With the imports done and the data loaded, let's take a look at it.
df
| | satisfaction_level | last_evaluation | number_project | average_montly_hours | time_spend_company | Work_accident | left | promotion_last_5years | Department | salary |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.38 | 0.53 | 2 | 157 | 3 | 0 | 1 | 0 | sales | low |
| 1 | 0.80 | 0.86 | 5 | 262 | 6 | 0 | 1 | 0 | sales | medium |
| 2 | 0.11 | 0.88 | 7 | 272 | 4 | 0 | 1 | 0 | sales | medium |
| 3 | 0.72 | 0.87 | 5 | 223 | 5 | 0 | 1 | 0 | sales | low |
| 4 | 0.37 | 0.52 | 2 | 159 | 3 | 0 | 1 | 0 | sales | low |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 14994 | 0.40 | 0.57 | 2 | 151 | 3 | 0 | 1 | 0 | support | low |
| 14995 | 0.37 | 0.48 | 2 | 160 | 3 | 0 | 1 | 0 | support | low |
| 14996 | 0.37 | 0.53 | 2 | 143 | 3 | 0 | 1 | 0 | support | low |
| 14997 | 0.11 | 0.96 | 6 | 280 | 4 | 0 | 1 | 0 | support | low |
| 14998 | 0.37 | 0.52 | 2 | 158 | 3 | 0 | 1 | 0 | support | low |
14999 rows × 10 columns
df.head()
| | satisfaction_level | last_evaluation | number_project | average_montly_hours | time_spend_company | Work_accident | left | promotion_last_5years | Department | salary |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.38 | 0.53 | 2 | 157 | 3 | 0 | 1 | 0 | sales | low |
| 1 | 0.80 | 0.86 | 5 | 262 | 6 | 0 | 1 | 0 | sales | medium |
| 2 | 0.11 | 0.88 | 7 | 272 | 4 | 0 | 1 | 0 | sales | medium |
| 3 | 0.72 | 0.87 | 5 | 223 | 5 | 0 | 1 | 0 | sales | low |
| 4 | 0.37 | 0.52 | 2 | 159 | 3 | 0 | 1 | 0 | sales | low |
df.tail()
| | satisfaction_level | last_evaluation | number_project | average_montly_hours | time_spend_company | Work_accident | left | promotion_last_5years | Department | salary |
|---|---|---|---|---|---|---|---|---|---|---|
| 14994 | 0.40 | 0.57 | 2 | 151 | 3 | 0 | 1 | 0 | support | low |
| 14995 | 0.37 | 0.48 | 2 | 160 | 3 | 0 | 1 | 0 | support | low |
| 14996 | 0.37 | 0.53 | 2 | 143 | 3 | 0 | 1 | 0 | support | low |
| 14997 | 0.11 | 0.96 | 6 | 280 | 4 | 0 | 1 | 0 | support | low |
| 14998 | 0.37 | 0.52 | 2 | 158 | 3 | 0 | 1 | 0 | support | low |
df.count()
satisfaction_level       14999
last_evaluation          14999
number_project           14999
average_montly_hours     14999
time_spend_company       14999
Work_accident            14999
left                     14999
promotion_last_5years    14999
Department               14999
salary                   14999
dtype: int64
df.shape
(14999, 10)
df.columns
Index(['satisfaction_level', 'last_evaluation', 'number_project',
'average_montly_hours', 'time_spend_company', 'Work_accident', 'left',
'promotion_last_5years', 'Department', 'salary'],
dtype='object')
df.groupby("left").mean()
| left | satisfaction_level | last_evaluation | number_project | average_montly_hours | time_spend_company | Work_accident | promotion_last_5years |
|---|---|---|---|---|---|---|---|
| 0 | 0.666810 | 0.715473 | 3.786664 | 199.060203 | 3.380032 | 0.175009 | 0.026251 |
| 1 | 0.440098 | 0.718113 | 3.855503 | 207.419210 | 3.876505 | 0.047326 | 0.005321 |
pd.crosstab(df.salary, df.left).plot(kind='bar')
<AxesSubplot:xlabel='salary'>
dependentDf = df[['satisfaction_level','average_montly_hours','promotion_last_5years','salary']]  # note: despite the name, these are the independent (predictor) columns
dependentDf.head()
| | satisfaction_level | average_montly_hours | promotion_last_5years | salary |
|---|---|---|---|---|
| 0 | 0.38 | 157 | 0 | low |
| 1 | 0.80 | 262 | 0 | medium |
| 2 | 0.11 | 272 | 0 | medium |
| 3 | 0.72 | 223 | 0 | low |
| 4 | 0.37 | 159 | 0 | low |
Now that the data is loaded, let's take a look at what it contains, i.e., the data types, the number of NaN values, etc. We can do that using the .info() method.
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14999 entries, 0 to 14998
Data columns (total 10 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   satisfaction_level     14999 non-null  float64
 1   last_evaluation        14999 non-null  float64
 2   number_project         14999 non-null  int64
 3   average_montly_hours   14999 non-null  int64
 4   time_spend_company     14999 non-null  int64
 5   Work_accident          14999 non-null  int64
 6   left                   14999 non-null  int64
 7   promotion_last_5years  14999 non-null  int64
 8   Department             14999 non-null  object
 9   salary                 14999 non-null  object
dtypes: float64(2), int64(6), object(2)
memory usage: 1.1+ MB
There seem to be no missing or weird values in the data, which is very good. Let's look at the summary statistics.
df.describe()
| | satisfaction_level | last_evaluation | number_project | average_montly_hours | time_spend_company | Work_accident | left | promotion_last_5years |
|---|---|---|---|---|---|---|---|---|
| count | 14999.000000 | 14999.000000 | 14999.000000 | 14999.000000 | 14999.000000 | 14999.000000 | 14999.000000 | 14999.000000 |
| mean | 0.612834 | 0.716102 | 3.803054 | 201.050337 | 3.498233 | 0.144610 | 0.238083 | 0.021268 |
| std | 0.248631 | 0.171169 | 1.232592 | 49.943099 | 1.460136 | 0.351719 | 0.425924 | 0.144281 |
| min | 0.090000 | 0.360000 | 2.000000 | 96.000000 | 2.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 0.440000 | 0.560000 | 3.000000 | 156.000000 | 3.000000 | 0.000000 | 0.000000 | 0.000000 |
| 50% | 0.640000 | 0.720000 | 4.000000 | 200.000000 | 3.000000 | 0.000000 | 0.000000 | 0.000000 |
| 75% | 0.820000 | 0.870000 | 5.000000 | 245.000000 | 4.000000 | 0.000000 | 0.000000 | 0.000000 |
| max | 1.000000 | 1.000000 | 7.000000 | 310.000000 | 10.000000 | 1.000000 | 1.000000 | 1.000000 |
Next, we will visualize the data. This will help us further understand the data and spot any work that still needs to be done.
import seaborn as sns
sns.set()  # sets the default seaborn style for the plots
fig = plt.figure(figsize=(12,6))  # creates a figure of the given size
sns.barplot(x='Department', y='satisfaction_level', hue='salary', data=df, ci=None)
plt.title("Satisfaction_level Vs Department", size=15)
plt.show()
fig = plt.figure(figsize=(12,6))
g = sns.barplot(x='Department', y='last_evaluation', data=df, ci=None)
g.bar_label(g.containers[0])
plt.title("Last Evaluation Vs Department", size=15)
plt.show()
fig = plt.figure(figsize=(12,6))
sns.barplot(x='Department', y='number_project', data=df, ci=None)
plt.title("Number Project Vs Department", size=15)
plt.show()
fig = plt.figure(figsize=(12,6))
g = sns.barplot(x='Department', y='Work_accident', data=df, ci=None)
g.bar_label(g.containers[0])
plt.title("Work Accident Vs Department", size=15)
plt.show()
fig = plt.figure(figsize=(12,6))
g = sns.barplot(x='Department', y='satisfaction_level', hue='left', data=df, ci=None)
g.bar_label(g.containers[0])
g.bar_label(g.containers[1], rotation=90)
plt.title("Left Vs Department with satisfaction level", size=15)
plt.show()
fig = plt.figure(figsize=(12,6))
sns.lineplot(x='Department', y='time_spend_company', data=df, ci=None, color='r', marker='o')
plt.title("Time spent per each Department", size=15)
plt.show()
fig = plt.figure(figsize=(12,6))
sns.lineplot(x='Department', y='average_montly_hours', data=df, ci=None, color='g', marker='*',ms=20)
plt.title("Avg Hours spent in the company per Department", size=15)
plt.show()
fig = plt.figure(figsize=(12,6))
sns.lineplot(x='Department', y='promotion_last_5years', data=df, ci=None, color='black', marker='D',ms=15)
plt.title("Promotions of Last 5 years in the company per Department", size=15)
plt.show()
salary_dummies = pd.get_dummies(dependentDf.salary, prefix='salary')
salary_dummies
| | salary_high | salary_low | salary_medium |
|---|---|---|---|
| 0 | 0 | 1 | 0 |
| 1 | 0 | 0 | 1 |
| 2 | 0 | 0 | 1 |
| 3 | 0 | 1 | 0 |
| 4 | 0 | 1 | 0 |
| ... | ... | ... | ... |
| 14994 | 0 | 1 | 0 |
| 14995 | 0 | 1 | 0 |
| 14996 | 0 | 1 | 0 |
| 14997 | 0 | 1 | 0 |
| 14998 | 0 | 1 | 0 |
14999 rows × 3 columns
df_with_dummies = pd.concat([dependentDf,salary_dummies], axis='columns')
df_with_dummies.head()
| | satisfaction_level | average_montly_hours | promotion_last_5years | salary | salary_high | salary_low | salary_medium |
|---|---|---|---|---|---|---|---|
| 0 | 0.38 | 157 | 0 | low | 0 | 1 | 0 |
| 1 | 0.80 | 262 | 0 | medium | 0 | 0 | 1 |
| 2 | 0.11 | 272 | 0 | medium | 0 | 0 | 1 |
| 3 | 0.72 | 223 | 0 | low | 0 | 1 | 0 |
| 4 | 0.37 | 159 | 0 | low | 0 | 1 | 0 |
df_with_dummies.drop('salary', axis='columns', inplace=True)
df_with_dummies.head()
| | satisfaction_level | average_montly_hours | promotion_last_5years | salary_high | salary_low | salary_medium |
|---|---|---|---|---|---|---|
| 0 | 0.38 | 157 | 0 | 0 | 1 | 0 |
| 1 | 0.80 | 262 | 0 | 0 | 0 | 1 |
| 2 | 0.11 | 272 | 0 | 0 | 0 | 1 |
| 3 | 0.72 | 223 | 0 | 0 | 1 | 0 |
| 4 | 0.37 | 159 | 0 | 0 | 1 | 0 |
y = df.left
y.head()
0    1
1    1
2    1
3    1
4    1
Name: left, dtype: int64
df['left'].value_counts()
0    11428
1     3571
Name: left, dtype: int64
sns.pairplot(df)
<seaborn.axisgrid.PairGrid at 0x223ce4bf3a0>
cols = ['satisfaction_level', 'last_evaluation', 'number_project',
'average_montly_hours', 'time_spend_company', 'Work_accident', 'left',
'promotion_last_5years']
fig = plt.figure(figsize=(20,12))
corr = df[cols].corr()
sns.heatmap(corr,cbar= True, annot = True, fmt = '.2f', annot_kws = {'size':10}, yticklabels =cols, xticklabels = cols )
plt.show()
# Logistic Regression model
df1 = df[['salary','satisfaction_level',
'average_montly_hours',
'promotion_last_5years','left']]
df1
| | salary | satisfaction_level | average_montly_hours | promotion_last_5years | left |
|---|---|---|---|---|---|
| 0 | low | 0.38 | 157 | 0 | 1 |
| 1 | medium | 0.80 | 262 | 0 | 1 |
| 2 | medium | 0.11 | 272 | 0 | 1 |
| 3 | low | 0.72 | 223 | 0 | 1 |
| 4 | low | 0.37 | 159 | 0 | 1 |
| ... | ... | ... | ... | ... | ... |
| 14994 | low | 0.40 | 151 | 0 | 1 |
| 14995 | low | 0.37 | 160 | 0 | 1 |
| 14996 | low | 0.37 | 143 | 0 | 1 |
| 14997 | low | 0.11 | 280 | 0 | 1 |
| 14998 | low | 0.37 | 158 | 0 | 1 |
14999 rows × 5 columns
dummies = pd.get_dummies(df1.salary)
dummies
| | high | low | medium |
|---|---|---|---|
| 0 | 0 | 1 | 0 |
| 1 | 0 | 0 | 1 |
| 2 | 0 | 0 | 1 |
| 3 | 0 | 1 | 0 |
| 4 | 0 | 1 | 0 |
| ... | ... | ... | ... |
| 14994 | 0 | 1 | 0 |
| 14995 | 0 | 1 | 0 |
| 14996 | 0 | 1 | 0 |
| 14997 | 0 | 1 | 0 |
| 14998 | 0 | 1 | 0 |
14999 rows × 3 columns
Use pandas.concat() to concatenate two or more DataFrames along rows or columns. When you concat() DataFrames on rows (the default, axis=0), it creates a new DataFrame containing all rows of the inputs, essentially appending one DataFrame to another. When you use concat() on columns (axis=1 or axis='columns'), it performs a join on the index:
data = [df, df1]
df2 = pd.concat(data)
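To make the row-wise vs column-wise behaviour concrete, here is a minimal sketch with two invented toy frames (`a` and `b` are illustrative, not part of the HR data):

```python
import pandas as pd

# Two small frames sharing the same index
a = pd.DataFrame({"x": [1, 2]})
b = pd.DataFrame({"y": [3, 4]})

rows = pd.concat([a, a])                   # stacks rows: shape (4, 1)
cols = pd.concat([a, b], axis="columns")   # joins on the index: shape (2, 2)

print(rows.shape)  # (4, 1)
print(cols.shape)  # (2, 2)
```

Note that stacking rows keeps the original index labels of each input, so `rows` has duplicate index values unless you pass `ignore_index=True`.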
df1 = pd.concat([df1,dummies],axis = 'columns')
df1
| | salary | satisfaction_level | average_montly_hours | promotion_last_5years | left | high | low | medium |
|---|---|---|---|---|---|---|---|---|
| 0 | low | 0.38 | 157 | 0 | 1 | 0 | 1 | 0 |
| 1 | medium | 0.80 | 262 | 0 | 1 | 0 | 0 | 1 |
| 2 | medium | 0.11 | 272 | 0 | 1 | 0 | 0 | 1 |
| 3 | low | 0.72 | 223 | 0 | 1 | 0 | 1 | 0 |
| 4 | low | 0.37 | 159 | 0 | 1 | 0 | 1 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 14994 | low | 0.40 | 151 | 0 | 1 | 0 | 1 | 0 |
| 14995 | low | 0.37 | 160 | 0 | 1 | 0 | 1 | 0 |
| 14996 | low | 0.37 | 143 | 0 | 1 | 0 | 1 | 0 |
| 14997 | low | 0.11 | 280 | 0 | 1 | 0 | 1 | 0 |
| 14998 | low | 0.37 | 158 | 0 | 1 | 0 | 1 | 0 |
14999 rows × 8 columns
By using the pandas.DataFrame.drop() method you can remove rows or columns from a DataFrame. The axis parameter specifies which axis to drop along: by default axis=0, meaning rows are removed; use axis=1 (or axis='columns') to remove columns. By default pandas returns a copy of the DataFrame after dropping; use inplace=True to modify the existing DataFrame instead.
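A quick sketch of drop() on a toy frame (the column names here are invented for illustration):

```python
import pandas as pd

toy = pd.DataFrame({"keep": [1, 2], "scrap": [3, 4]})

# axis='columns' (or axis=1) removes columns; a new copy is returned
trimmed = toy.drop("scrap", axis="columns")
print(list(trimmed.columns))  # ['keep']

# the original is untouched unless inplace=True is used
print(list(toy.columns))  # ['keep', 'scrap']
```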
df1 = df1.drop(['salary','medium'],axis='columns')
df1
| | satisfaction_level | average_montly_hours | promotion_last_5years | left | high | low |
|---|---|---|---|---|---|---|
| 0 | 0.38 | 157 | 0 | 1 | 0 | 1 |
| 1 | 0.80 | 262 | 0 | 1 | 0 | 0 |
| 2 | 0.11 | 272 | 0 | 1 | 0 | 0 |
| 3 | 0.72 | 223 | 0 | 1 | 0 | 1 |
| 4 | 0.37 | 159 | 0 | 1 | 0 | 1 |
| ... | ... | ... | ... | ... | ... | ... |
| 14994 | 0.40 | 151 | 0 | 1 | 0 | 1 |
| 14995 | 0.37 | 160 | 0 | 1 | 0 | 1 |
| 14996 | 0.37 | 143 | 0 | 1 | 0 | 1 |
| 14997 | 0.11 | 280 | 0 | 1 | 0 | 1 |
| 14998 | 0.37 | 158 | 0 | 1 | 0 | 1 |
14999 rows × 6 columns
df.describe().T
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| satisfaction_level | 14999.0 | 0.612834 | 0.248631 | 0.09 | 0.44 | 0.64 | 0.82 | 1.0 |
| last_evaluation | 14999.0 | 0.716102 | 0.171169 | 0.36 | 0.56 | 0.72 | 0.87 | 1.0 |
| number_project | 14999.0 | 3.803054 | 1.232592 | 2.00 | 3.00 | 4.00 | 5.00 | 7.0 |
| average_montly_hours | 14999.0 | 201.050337 | 49.943099 | 96.00 | 156.00 | 200.00 | 245.00 | 310.0 |
| time_spend_company | 14999.0 | 3.498233 | 1.460136 | 2.00 | 3.00 | 3.00 | 4.00 | 10.0 |
| Work_accident | 14999.0 | 0.144610 | 0.351719 | 0.00 | 0.00 | 0.00 | 0.00 | 1.0 |
| left | 14999.0 | 0.238083 | 0.425924 | 0.00 | 0.00 | 0.00 | 0.00 | 1.0 |
| promotion_last_5years | 14999.0 | 0.021268 | 0.144281 | 0.00 | 0.00 | 0.00 | 0.00 | 1.0 |
df.hist(bins=30, figsize=(20,20), color='g');
pd.crosstab(df.salary, df.left).plot(kind='bar', figsize=(15,7))
<AxesSubplot:xlabel='salary'>
df.groupby('salary').mean()
| salary | satisfaction_level | last_evaluation | number_project | average_montly_hours | time_spend_company | Work_accident | left | promotion_last_5years |
|---|---|---|---|---|---|---|---|---|
| high | 0.637470 | 0.704325 | 3.767179 | 199.867421 | 3.692805 | 0.155214 | 0.066289 | 0.058205 |
| low | 0.600753 | 0.717017 | 3.799891 | 200.996583 | 3.438218 | 0.142154 | 0.296884 | 0.009021 |
| medium | 0.621817 | 0.717322 | 3.813528 | 201.338349 | 3.529010 | 0.145361 | 0.204313 | 0.028079 |
With the help of EDA we come to know that this is a classification problem with independent and dependent variables; based on the independent variables we will classify, or predict, the output. We assign "X" as the independent variables and "y" as the dependent (target) variable.
X = df.drop(['left'], axis=1)
y = df.left
df.head()
| | satisfaction_level | last_evaluation | number_project | average_montly_hours | time_spend_company | Work_accident | left | promotion_last_5years | Department | salary |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.38 | 0.53 | 2 | 157 | 3 | 0 | 1 | 0 | sales | low |
| 1 | 0.80 | 0.86 | 5 | 262 | 6 | 0 | 1 | 0 | sales | medium |
| 2 | 0.11 | 0.88 | 7 | 272 | 4 | 0 | 1 | 0 | sales | medium |
| 3 | 0.72 | 0.87 | 5 | 223 | 5 | 0 | 1 | 0 | sales | low |
| 4 | 0.37 | 0.52 | 2 | 159 | 3 | 0 | 1 | 0 | sales | low |
y.head()
0    1
1    1
2    1
3    1
4    1
Name: left, dtype: int64
sns.pairplot(data=df, hue='left')
<seaborn.axisgrid.PairGrid at 0x223d5603670>
sns.heatmap(X.corr(),
xticklabels=X.columns,
yticklabels=X.columns)
<AxesSubplot:>
The features most strongly correlated with one another are 'last_evaluation', 'number_project', 'average_montly_hours', and 'time_spend_company'.
We also see from the correlation heatmap that the correlation of the target with all the features is low. Moreover, the pairwise distributions indicate that a linear model might not perform well.
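As a sketch of how such feature-target correlations can be read off directly from one column of the correlation matrix (toy data, not the HR dataset):

```python
import pandas as pd

toy = pd.DataFrame({
    "feature_up":   [1, 2, 3, 4],   # rises with the target
    "feature_down": [4, 3, 2, 1],   # falls with the target
    "target":       [1, 2, 3, 4],
})

# One column of the correlation matrix: each feature vs the target
print(toy.corr()["target"])
# feature_up      1.0
# feature_down   -1.0
# target          1.0
```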
df.left.value_counts()
0    11428
1     3571
Name: left, dtype: int64
Here you can see that, out of 14,999 employees, 3,571 left and 11,428 stayed. The employees who left make up roughly 23.8% of the workforce.
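That percentage comes straight from the value counts above; a minimal check:

```python
# Counts from df.left.value_counts() above
left, stayed = 3571, 11428
share = left / (left + stayed)
print(round(share, 4))  # 0.2381, i.e. roughly 23.8% of employees left
```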
sns.catplot(x='number_project', y='last_evaluation', data=df, kind='point')  # factorplot was renamed catplot in seaborn 0.9
<seaborn.axisgrid.FacetGrid at 0x223d4212040>
sns.boxplot(x='number_project', y='satisfaction_level', data=df)
<AxesSubplot:xlabel='number_project', ylabel='satisfaction_level'>
df
| | satisfaction_level | last_evaluation | number_project | average_montly_hours | time_spend_company | Work_accident | left | promotion_last_5years | Department | salary |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.38 | 0.53 | 2 | 157 | 3 | 0 | 1 | 0 | sales | low |
| 1 | 0.80 | 0.86 | 5 | 262 | 6 | 0 | 1 | 0 | sales | medium |
| 2 | 0.11 | 0.88 | 7 | 272 | 4 | 0 | 1 | 0 | sales | medium |
| 3 | 0.72 | 0.87 | 5 | 223 | 5 | 0 | 1 | 0 | sales | low |
| 4 | 0.37 | 0.52 | 2 | 159 | 3 | 0 | 1 | 0 | sales | low |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 14994 | 0.40 | 0.57 | 2 | 151 | 3 | 0 | 1 | 0 | support | low |
| 14995 | 0.37 | 0.48 | 2 | 160 | 3 | 0 | 1 | 0 | support | low |
| 14996 | 0.37 | 0.53 | 2 | 143 | 3 | 0 | 1 | 0 | support | low |
| 14997 | 0.11 | 0.96 | 6 | 280 | 4 | 0 | 1 | 0 | support | low |
| 14998 | 0.37 | 0.52 | 2 | 158 | 3 | 0 | 1 | 0 | support | low |
14999 rows × 10 columns
accidentplot = plt.figure(figsize=(10,6))
accidentplotax = accidentplot.add_axes([0,0,1,1])
accidentplotax = sns.violinplot(x='Department', y='average_montly_hours', hue='Work_accident', split=True, data=df)  # violinplot has no jitter parameter
Let's prepare the data for modeling by splitting it into training and test sets.
# We now use model_selection instead of cross_validation
from sklearn.model_selection import train_test_split
X = df.drop('left', axis=1)
y = df['left']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=47)
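Since only about 24% of employees left, a plain random split can drift from that class ratio; passing stratify=y keeps the left/stayed balance consistent across both sets. A minimal sketch on invented toy labels:

```python
from sklearn.model_selection import train_test_split

X_toy = list(range(100))
y_toy = [0] * 80 + [1] * 20   # 20% positive class, like an imbalanced target

Xtr, Xte, ytr, yte = train_test_split(
    X_toy, y_toy, test_size=0.25, stratify=y_toy, random_state=47
)

# Both splits preserve the 20% positive rate
print(sum(ytr) / len(ytr))  # 0.2
print(sum(yte) / len(yte))  # 0.2
```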
df.dtypes
satisfaction_level       float64
last_evaluation          float64
number_project             int64
average_montly_hours       int64
time_spend_company         int64
Work_accident              int64
left                       int64
promotion_last_5years      int64
Department                object
salary                    object
dtype: object
# Employee distribution across departments
# Types of colors
color_types = ['#78C850','#F08030','#6890F0','#A8B820','#A8A878','#A040A0','#F8D030',
'#E0C068','#EE99AC','#C03028','#F85888','#B8A038','#705898','#98D8D8','#7038F8']
# Count Plot (a.k.a. Bar Plot)
sns.countplot(x='Department', data=df, palette=color_types).set_title('Employee Department Distribution');
# Rotate x-labels
plt.xticks(rotation=-45)
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), [Text(0, 0, 'sales'), Text(1, 0, 'accounting'), Text(2, 0, 'hr'), Text(3, 0, 'technical'), Text(4, 0, 'support'), Text(5, 0, 'management'), Text(6, 0, 'IT'), Text(7, 0, 'product_mng'), Text(8, 0, 'marketing'), Text(9, 0, 'RandD')])
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df1[['satisfaction_level','average_montly_hours','promotion_last_5years','high','low']],df1.left)
model = LogisticRegression()
model.fit(X_train,y_train)
LogisticRegression()
y_pred=model.predict(X_test)
y_pred
array([0, 0, 0, ..., 0, 0, 0], dtype=int64)
y_test
8874 0
12046 1
2165 0
14677 1
505 1
..
3485 0
2315 0
1284 1
4120 0
7603 0
Name: left, Length: 3750, dtype: int64
from sklearn.metrics import confusion_matrix
confusion_matrix(y_test,y_pred)
array([[2639, 168],
[ 693, 250]], dtype=int64)
model.score(X_test,y_test)
0.7704
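The 0.7704 accuracy follows directly from the confusion matrix above, but accuracy alone hides how the model does on the minority class; a quick check using the matrix values shows recall on leavers is much weaker:

```python
# Values from confusion_matrix(y_test, y_pred) above:
# rows = actual (0 = stayed, 1 = left), columns = predicted
tn, fp = 2639, 168
fn, tp = 693, 250

accuracy = (tn + tp) / (tn + fp + fn + tp)
recall_left = tp / (tp + fn)   # share of actual leavers the model caught

print(round(accuracy, 4))     # 0.7704 -- matches model.score
print(round(recall_left, 4))  # 0.2651 -- most leavers are missed
```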
It's definitely worth taking our employees' satisfaction levels more seriously. We've discovered that satisfaction is related to, among other things, salary and the number of projects an employee has. Further study could find an optimal combination of salary, number of projects, and other important factors in taking care of our people, leading to better performance, higher profits, and a lower employee attrition rate. It's also worth noting that time spent at the company and employee evaluations have an important effect on whether employees leave. Since this could ultimately be connected to the work itself, it's worth investigating in more detail how departments delegate projects to their employees and what kinds of projects they're given, especially since HR and accounting tend to have higher leave rates than the other functions.